average precision
Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Video understanding relies on accurate action detection for temporal analysis. However, existing mainstream methods have limitations in real-world applications due to their offline and closed-set evaluation approaches, as well as their dependence on manual annotations. To address these challenges and enable real-time action understanding in open-world scenarios, we propose OV-OAD, a zero-shot online action detector that leverages vision-language models and learns solely from text supervision.
Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation (Supplementary Material)
Differently, our unsupervised multi-body task requires the model's ability to handle part-level local equivariance, Figure 1: Structure of our feature extractor based on EPN. "EPNConv" is the SE(3)-equivariant convolution proposed in the vanilla EPN network. Part-level SE(3)-equivariance is desirable for motion analysis, especially rotation estimation. Song and Y ang utilized the methodology proposed by Choy et al . All other objects were considered part of the background.
Maximization of Average Precision for Deep Learning with Adversarial Ranking Robustness
This paper seeks to address a gap in optimizing Average Precision (AP) while ensuring adversarial robustness, an area that has not been extensively explored to the best of our knowledge. AP maximization for deep learning has widespread applications, particularly when there is a significant imbalance between positive and negative examples. Although numerous studies have been conducted on adversarial training, they primarily focus on robustness concerning accuracy, ensuring that the average accuracy on adversarially perturbed examples is well maintained. However, this type of adversarial robustness is insufficient for many applications, as minor perturbations on a single example can significantly impact AP while not greatly influencing the accuracy of the prediction system. To tackle this issue, we introduce a novel formulation that combines an AP surrogate loss with a regularization term representing adversarial ranking robustness, which maintains the consistency between ranking of clean data and that of perturbed data. We then devise an efficient stochastic optimization algorithm to optimize the resulting objective. Our empirical studies, which compare our method to current leading adversarial training baselines and other robust AP maximization strategies, demonstrate the effectiveness of the proposed approach.
Image2Net: Datasets, Benchmark and Hybrid Framework to Convert Analog Circuit Diagrams into Netlists
Xu, Haohang, Liu, Chengjie, Wang, Qihang, Huang, Wenhao, Xu, Yongjian, Chen, Weiyu, Peng, Anlan, Li, Zhijun, Li, Bo, Qi, Lei, Yang, Jun, Du, Yuan, Du, Li
Abstract--Large Language Model (LLM) exhibits great potential in designing of analog integrated circuits (IC) because of its excellence in abstraction and generalization for knowledge. However, further development of LLM-based analog ICs heavily relies on textual description of analog ICs, while existing analog ICs are mostly illustrated in image-based circuit diagrams rather than text-based netlists. Converting circuit diagrams to netlists help LLMs to enrich the knowledge of analog IC. Nevertheless, previously proposed conversion frameworks face challenges in further application because of limited support of image styles and circuit elements. Up to now, it still remains a challenging task to effectively convert complex circuit diagrams into netlists. And a hybrid framework, named Image2Net, is proposed for practical conversion from circuit diagrams to netlists. The netlist edit distance (NED) is also introduced to precisely assess the difference between the converted netlists and ground truth. Based on our benchmark, Image2Net achieves 80.77% successful rate, which is 34.62%- 45.19% higher than previous works. Specifically, the proposed work shows 0.116 averaged NED, which is 62.1%-69.6%